SIMD high-performance xorcpy in C

In my current project I need a high-performance xorcpy implemented in C. Here is my implementation, which I'd like to share. I tested it on my Xeon E5-2680 PC, where it reaches almost 1.7 GB/s.

    #include <emmintrin.h>
    …
    __forceinline unsigned char* xorcpy(unsigned char* dst, const unsigned char* src, unsigned block_size) {
        // Do the bulk of the copy
        …
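The full implementation is not reproduced in this excerpt. The sketch below shows one common way to build an SSE2 xorcpy with the intrinsics from `<emmintrin.h>`. It makes several assumptions that the post does not state. It assumes the semantics are `dst[i] ^= src[i]` for every byte, with `dst` returned the way `memcpy` returns it. It renames the function to the hypothetical `xorcpy_sse2` and takes the length as `size_t`. It also uses portable `static inline` instead of MSVC's `__forceinline`. It is not the author's code.

```c
#include <stddef.h>      /* size_t */
#include <emmintrin.h>   /* SSE2: __m128i, _mm_loadu_si128, _mm_xor_si128, _mm_storeu_si128 */

/*
 * Hypothetical sketch (not the original post's code): XOR block_size bytes
 * of src into dst, i.e. dst[i] ^= src[i]. Returns dst, like memcpy.
 * Unaligned loads/stores are used, so neither buffer needs 16-byte alignment.
 */
static inline unsigned char *xorcpy_sse2(unsigned char *dst,
                                         const unsigned char *src,
                                         size_t block_size)
{
    size_t i = 0;

    /* Bulk: 64 bytes per iteration as four independent 16-byte vectors. */
    for (; i + 64 <= block_size; i += 64) {
        __m128i d0 = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i d1 = _mm_loadu_si128((const __m128i *)(dst + i + 16));
        __m128i d2 = _mm_loadu_si128((const __m128i *)(dst + i + 32));
        __m128i d3 = _mm_loadu_si128((const __m128i *)(dst + i + 48));
        __m128i s0 = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i s1 = _mm_loadu_si128((const __m128i *)(src + i + 16));
        __m128i s2 = _mm_loadu_si128((const __m128i *)(src + i + 32));
        __m128i s3 = _mm_loadu_si128((const __m128i *)(src + i + 48));
        _mm_storeu_si128((__m128i *)(dst + i),      _mm_xor_si128(d0, s0));
        _mm_storeu_si128((__m128i *)(dst + i + 16), _mm_xor_si128(d1, s1));
        _mm_storeu_si128((__m128i *)(dst + i + 32), _mm_xor_si128(d2, s2));
        _mm_storeu_si128((__m128i *)(dst + i + 48), _mm_xor_si128(d3, s3));
    }

    /* Remaining whole 16-byte blocks. */
    for (; i + 16 <= block_size; i += 16) {
        __m128i d = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i s = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_xor_si128(d, s));
    }

    /* Scalar tail for the last 0..15 bytes. */
    for (; i < block_size; ++i)
        dst[i] ^= src[i];

    return dst;
}
```

The 4x unrolling gives the CPU several independent loads and XORs in flight. Whether it actually beats the plain 16-byte loop depends on the compiler and CPU, so it is worth measuring. For buffers much larger than the caches, throughput is usually bounded by memory bandwidth rather than by the XOR itself. Because XOR is its own inverse, calling the function twice with the same `src` restores the original `dst`, which makes a convenient correctness check.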